Statistical Phrase-Based Post-Editing
Abstract
We propose to use a statistical phrase-based machine translation system in a post-editing task: the system takes as input raw machine translation output (from a commercial rule-based MT system) and produces post-edited target-language text. We report on experiments performed on data collected in precisely such a setting: pairs of raw MT output and their manually post-edited versions. In our evaluation, the output of our automatic post-editing (APE) system is not only of better quality than the rule-based MT output (in terms of both the BLEU and TER metrics), it is also better than the output of a state-of-the-art phrase-based MT system used in standalone translation mode. These results indicate that automatic post-editing constitutes a simple and efficient way of combining rule-based and statistical MT technologies.
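The data flow described in the abstract (raw MT output in, post-edited text out) can be illustrated with a toy sketch. This is not the authors' system: a real statistical phrase-based system scores competing phrase pairs with learned probabilities and a language model, whereas the sketch below only applies a fixed table with greedy longest-match substitution. The function name `post_edit`, the `max_len` parameter, and the entries of `TOY_TABLE` are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]


def post_edit(raw_mt: str, phrase_table: Dict[Phrase, Phrase], max_len: int = 3) -> str:
    """Rewrite raw MT output left to right using a toy phrase table.

    Assumption: greedy longest-match substitution stands in for a real
    phrase-based decoder, which would search over scored alternatives.
    """
    tokens = raw_mt.split()
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, down to a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            src = tuple(tokens[i:i + n])
            if src in phrase_table:
                out.extend(phrase_table[src])
                i += n
                break
        else:
            # No phrase matched at this position: copy the token unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


# Hypothetical entries, as if extracted from (raw MT, post-edited) sentence pairs.
TOY_TABLE: Dict[Phrase, Phrase] = {
    ("make", "a", "decision"): ("take", "a", "decision"),
    ("informations",): ("information",),
}
```

Calling `post_edit("we make a decision on these informations", TOY_TABLE)` rewrites the two phrases found in the table and copies every other token through unchanged, which mirrors the monolingual "translation" from raw MT output into post-edited text.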
Similar Resources
PEPr: Post-Edit Propagation Using Phrase-based Statistical Machine Translation
Translators who work by post-editing machine translation output often find themselves repeatedly correcting the same errors. We propose a method for Post-edit Propagation (PEPr), which learns post-editor corrections and applies them on-the-fly to further MT output. Our proposal is based on a phrase-based SMT system, used in an automatic post-editing (APE) setting with online learning. Simulated e...
Rule-Based Translation with Statistical Phrase-Based Post-Editing
This article describes a machine translation system based on an automatic post-editing strategy: first translate the input text into the target language using a rule-based MT system, then automatically post-edit the output using a statistical phrase-based system. An implementation of this approach based on the SYSTRAN and PORTAGE MT systems was used in the shared task of the Second Workshop...
Improving Translation Fluency with Search-Based Decoding and a Monolingual Statistical Machine Translation Model for Automatic Post-Editing
The BLEU scores and translation fluency of current state-of-the-art SMT systems based on IBM models are still too low for publication purposes. The major issue is that stochastically generated sentence hypotheses, produced through a stack decoding process, may not strictly follow the natural target language grammar, since the decoding process is directed by a highly simplified translation...
LIUM's statistical machine translation systems for IWSLT 2009
This paper describes the systems developed by the LIUM laboratory for the 2009 IWSLT evaluation. We participated in the Arabic and Chinese/English BTEC tasks. We developed three different systems: a statistical phrase-based system using the Moses toolkit, a Statistical Post-Editing (SPE) system and a hierarchical phrase-based system based on Joshua. A continuous space language model was deploy...
Can Statistical Post-Editing with a Small Parallel Corpus Save a Weak MT Engine?
Statistical post-editing has been shown in several studies to increase BLEU score for rule-based MT systems. However, previous studies have relied solely on BLEU and have not conducted further study to determine whether those gains indicated an increase in quality or in score alone. In this work we conduct a human evaluation of statistical post-edited output from a weak rule-based MT system, co...